CD8 TIL projection uncover cells types hidden by transient gene programs
Examples on Activation and Cell Cycle gene programs
Background
Cells responding to external cues can go into transient gene programs. In scRNA-seq data, clusters will form by the strongest variation component, which can be transient gene programs, and might not reflect the original, potentially more stable cell type.
Among the main types of program that we see being upregulated are:
Cell cycle, when cells go through division (eg. MCM5, TOP2A)
IFN response, in response to IFN stimulation (eg. MX1, ISG15)
HSP response, in response to stress (eg. HSPA1, HSPA2)
Activation response, in response to activation signal, eg TCR engagement (eg. JUN, FOS)
In two example, we will see how we can use reference-based projection to uncover the original cell types from activated and cycling cells.
First example: Activation program Yost et al. 2019
In Yost et al. 2019, the authors describe clonal replacement of tumor-specific T cells following PD-1 blockade using scRNA-seq data. The original annotation comprised a mixture of cell types (Naive, Memory etc) and cell states (Activated T, Activated / Exhausted).
Second example: Cycling cells program Gueguen et al. 2021
In Gueguen et al. 2021, the authors describe two population of cycling cells: CD4/8-MCM5 and CD4/8-TOP2A. As these cycling cells are a mix of CD4 and CD8 T cells, we will see first how we can isolate only CD8 from this mix of CD8/CD4 using the built-in scGate filtering included in the CD8 reference. We will then see if our reference-based approach ProjecTILs can help recover the original cell types annotation.
Loading the CD8 reference
First let’s load the CD8 TIL refence by downloading it from Figshare.
# Load the reference
options(timeout = max(900, getOption("timeout")))
#download.file("https://figshare.com/ndownloader/files/38921366", destfile = "CD8T_human_ref_v1.rds")
ref.cd8 <- load.reference.map("CD8T_human_ref_v1.rds")## [1] "Loading Custom Reference Atlas..."
## [1] "Loaded Custom Reference map Human CD8 TILs"
mycols <- ref.cd8@misc$atlas.palette
# Plotting
DimPlot(ref.cd8, group.by = 'functional.cluster', label = T, repel = T, cols = mycols) + theme(aspect.ratio = 1)We can see that the reference is composed only of what should only be stable cell types. Indeed, cells expressing highly transient gene programs (Cycling and IFN) were removed when constructing the reference.
First example: Activation program Yost et al. 2019
Load data
First we need to load the data and append metadata from the original study, including UMAP coordinates.
# Load data
datadir <- "input/Yost/"
cached.object <- paste0(datadir,"Yost.seurat.rds")
if (!file.exists(cached.object)) {
library(GEOquery)
geo_acc <- "GSE123813"
gse <- getGEO(geo_acc)
series <- paste0(geo_acc, "_series_matrix.txt.gz")
system(paste0("mkdir -p ", datadir))
getGEOSuppFiles(geo_acc, baseDir = datadir)
## Load expression matrix and metadata
exp.mat <- read.delim(sprintf("%s/%s/GSE123813_bcc_scRNA_counts.txt.gz",
datadir, geo_acc), header = F, sep = "\t")
query.object <- CreateSeuratObject(counts = exp.mat, project = "Yost", min.cells = 3, min.features = 50)
meta <- read.delim(sprintf("%s/%s/GSE123813_bcc_all_metadata.txt.gz", datadir, geo_acc), header = T, sep = "\t")
meta.T <- read.delim(sprintf("%s/%s/GSE123813_bcc_tcell_metadata.txt.gz", datadir, geo_acc), header = T, sep = "\t")
response <- read.csv(meta_response, header=T, sep=",")
query.object <- AddMetaData(query.object, meta$patient, col.name="patient")
rownames(meta) <- meta$cell.id
rownames(response) <- response$Patient
meta.T.c <- meta.T$cluster
names(meta.T.c) <- meta.T$cell.id
query.object <- AddMetaData(query.object, meta)
query.object <- AddMetaData(query.object, meta.T.c, col.name = "cluster")
query.object@meta.data$Response <- response$Response[match(query.object$patient, response$Patient)]
saveRDS(query.object, cached_seurat)
} else {
query.object <- readRDS(cached.object)
}Now we can have a look at the original study annotation, including
clusters. We can see that activation cluster (CD8_act)
is patient specific, as it seems driven mainly by patient
“su008”, after receiving immunotherapy treatment.
# Normalize data
query.object <- NormalizeData(query.object)
query.object <- ScaleData(query.object)## Centering and scaling data matrix
# Setting up original UMAP space
query.object@reductions[["umap"]] <- CreateDimReducObject(embeddings = cbind(query.object$UMAP1, query.object$UMAP2), key = "UMAP_", assay = DefaultAssay(query.object))
DimPlot(query.object, reduction = 'umap', group.by = 'cluster', label = T)DimPlot(query.object, reduction = 'umap', group.by = 'patient', label = T, repel = T)DimPlot(query.object, reduction = 'umap', group.by = 'treatment')# Filter cells without patient ID to match paper annotation
query.object<- query.object[,which(!is.na(query.object$patient))]Projection into the CD8 reference
As this dataset is a mix between CD4 and CD8 T cells, we will keep
the parameter filter.cells as TRUE.
# Mapping
query.projected <- Run.ProjecTILs(query.object, ref.cd8, split.by = "patient", filter.cells = T, ncores = 4)Let’s check consistency between the original and the projected annotations.
query.projected <- subset(query.projected, subset=cluster %in% c("CD8_act","CD8_eff","CD8_ex","CD8_ex_act","CD8_mem","Naive"))
df <- table(query.projected$cluster, query.projected$functional.cluster)
pheatmap(df, scale = 'row')Annotations globally agree. Still, we can see that some clusters, including CD8_act cluster seems to be mapping into multiple clusters in our reference.
Projections split by original annotations
To understand relationships between previous and our own annotation, we can split the projection by the original cluster annotation provided by the authors. In this case, we can see that CD8_mem are mainly mapped as Central memory cells (CD8.CM), while CD8_act are mapping into many different subtypes.
query.bysub <- SplitObject(query.projected, split.by = "cluster")
pll <- list()
for (n in names(query.bysub)) {
x <- query.bysub[[n]]
pll[[n]] <- plot.projection(ref.cd8, x, linesize = 0.5, pointsize = 0) + ggtitle(n) +
theme(axis.title=element_blank(),
axis.text=element_blank(),
axis.ticks=element_blank(), aspect.ratio = 1)
}
wrap_plots(pll)ggsave("plots/Yost_projected_by_subtype.png", height=6, width=12)We see that CD8_act cluster maps into multiple parts of the reference, as contrary to CD8_eff or CD8_ex for instance.
Radar to plots to check to assess quality
genes4radar <-
c(
'LEF1',
"TCF7",
"CCR7",
"GZMK",
"FGFBP2",
'FCGR3A',
'KLRG1',
"PDCD1",
'ZNF683',
'ITGAE',
"CRTAM",
"CD200",
"HAVCR2",
"TOX",
"ENTPD1",
'SLC4A10',
'TRAV1-2'
)
plot.states.radar(ref.cd8,
query = query.bysub,
min.cells = 0,
genes4radar = genes4radar) # By subtype
for (n in names(query.bysub)) {
pll <- plot.states.radar(ref.cd8, query = query.bysub[[n]], min.cells = 20,
genes4radar = genes4radar, return.as.list = TRUE)
p <- wrap_plots(pll) + plot_annotation(sprintf("Profiles for %s", n))
plot(p)
ggsave(sprintf("plots/Yost_radars_by_subtype.%s.pdf", n), height=9, width=14)
}Zoom on CD8_act in the original UMAP space
Let’s run ProjecTILs in classifier mode
(ProjecTILs.classifier) to check the original UMAP
space.
query.object <- ProjecTILs.classifier(query.object, ref.cd8, split.by = "patient", filter.cells = T, ncores = 4)
DimPlot(query.object, group.by = "functional.cluster", cols = mycols)Now let’s focus on the CD8_act cluster only.
query.object.sub <- subset(query.object, subset = cluster == "CD8_act")
DimPlot(query.object.sub, group.by = "functional.cluster", cols = mycols, repel = T, label = T)## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
DefaultAssay(query.object.sub) <- 'RNA'
FeaturePlot(query.object.sub, features = c('IL7R','FGFBP2','GZMK'), ncol = 3, pt.size = 0.5, order = T, cols = pals::coolwarm()) & NoLegend()We see that in original reduced space, within the activated cluster, we had cell types from our reference, including CM, TEMRA and EM clusters (respectively high for IL7R, FGFBP2 and GZMK).
Second example: Cycling cells program Gueguen et al. 2021
Load data
gueguen.cd3 <- readRDS('~/Dropbox/CSI/Datasets/Gueguen2021/NSCLC_CD3_11tumors.Rds')
gueguen.cd3$seurat_clusters <- Idents(gueguen.cd3)
# Projection
DefaultAssay(gueguen.cd3) <- "RNA"
gueguen.cd3 <- ProjecTILs.classifier(gueguen.cd3, ref = ref.cd8, filter.cells = T, split.by = 'orig.ident', ncores = 6)
table(gueguen.cd3$functional.cluster)##
## CD8.CM CD8.EM CD8.MAIT CD8.NaiveLike CD8.TEMRA
## 2132 3181 147 238 276
## CD8.TEX CD8.TPEX
## 3340 337
DimPlot(gueguen.cd3, group.by = 'functional.cluster', order = T, cols = mycols, label = T, repel = T)# Radar plots
p <- plot.states.radar(ref.cd8, query = gueguen.cd3, min.cells = 10, genes4radar = c('LEF1', "TCF7", "CCR7", "GZMK", "FGFBP2",'FCGR3A','ZNF683','ITGAE', "CRTAM", "CD200",'GNG4', "HAVCR2", "TOX", "ENTPD1", 'TYROBP','KIR2DL1'), return = T)
wrap_plots(p) + theme_bw()Cycling analysis
gueguen.cycling <- subset(gueguen.cd3, subset = seurat_clusters %in% c('CD4/8-TOP2A', "CD4/8-MCM5"))
# Dimplot
DimPlot(gueguen.cycling, group.by = 'functional.cluster', label = T, repel = T, cols = mycols)## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
# barplot
plot.statepred.composition(ref.cd8, gueguen.cycling, metric = "Percent") +
ggtitle("Cycling cells") + ylim(0, 75) + theme_bw() + RotatedAxis()We can see that we can recover the original phenotype of the cycling cells, including Effector Memory (EM), Central Memory (CM), Exhausted (TEX), and Progenitor Exhausted (TPEX). Let’s check the radar plots to see if they match the reference.
# Radar plots
plot.states.radar(ref.cd8, query = gueguen.cycling, min.cells = 10, genes4radar = c('LEF1', "TCF7", "CCR7", "GZMK", "FGFBP2",'FCGR3A','ZNF683','ITGAE', "CRTAM", "CD200",'GNG4', "HAVCR2", "TOX", "ENTPD1", 'TYROBP','KIR2DL1',"TOP2A","MCM5")) Indeed, the radars confirm that the annotation seems of good quality. Radar are good way to show that phenotypes match between the reference and the query, but they differ by the cycling genes (TOP2A and MCM5).